Introduction

In this section we will learn how to search for and download DNA methylation (epigenetic) and gene expression (transcription) data from the newly created NCI Genomic Data Commons (GDC) portal and prepare them as a SummarizedExperiment object.

The figure below highlights the part of the workflow that will be covered in this section.

Figure: Part of the workflow covered in this section

Downloading data

Loading GUI

First we will launch the GUI for TCGAbiolinks.

library(TCGAbiolinksGUI)
TCGAbiolinksGUI()

Gene expression

After launching the GUI, select GDC Data/Get GDC data/Molecular data.

Fill in the search fields with the information shown below and click on Visualize Data. If you select Filter using clinical data under the clinical filter, the clinical information will also be plotted.

A plot summarizing the data will be shown. For more details, you can also open the GDC search results: Results section.
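For readers who prefer the R console, the search step above has a rough programmatic counterpart in TCGAbiolinks, the package behind the GUI. The sketch below only builds a query and does not download anything. The helper name `query_expression`, the project `TCGA-LUSC`, and the workflow type are assumptions made for illustration; the original search fields are not reproduced in this text, and valid values depend on the TCGAbiolinks version and the current GDC data release.

```r
# Build a GDC query for gene expression data without downloading anything.
# `query_expression` is a hypothetical helper; its default values are
# assumptions and may need to be adapted to your TCGAbiolinks/GDC release.
query_expression <- function(project = "TCGA-LUSC",
                             workflow_type = "STAR - Counts") {
  TCGAbiolinks::GDCquery(
    project       = project,
    data.category = "Transcriptome Profiling",
    data.type     = "Gene Expression Quantification",
    workflow.type = workflow_type
  )
}

# Usage (needs TCGAbiolinks and internet access):
# query   <- query_expression()
# results <- TCGAbiolinks::getResults(query)  # one row per matching file
# head(results)
```

Calling `getResults()` on the query returns a table of the matching files, which plays a role similar to the Results section of the GUI.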

After the query is completed, you will be able to download the data and convert it to an R object in the Download & Prepare section. If successful, it will show a message indicating where the data was saved.

Visualizing the Summarized Experiment

The integrative data container SummarizedExperiment (Morgan et al., n.d.; Huber et al. 2015) contains three matrices: one with sample metadata, one with features metadata, and one with the assay data.
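To make the three components concrete, the following minimal sketch builds a toy SummarizedExperiment by hand and reads each part back with the standard accessors. The gene names, sample names, and values are made up for illustration and have nothing to do with the GDC data. For genomic data such as the objects prepared from GDC, the features metadata is often stored as genomic ranges and read with `rowRanges()` instead of `rowData()`.

```r
library(SummarizedExperiment)

# Toy assay: 3 genes (rows) x 2 samples (columns); values are made up
counts <- matrix(
  c(10, 0, 5, 20, 3, 8),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

# Sample metadata: one row per column of the assay
sample_info <- DataFrame(
  condition = c("tumor", "normal"),
  row.names = colnames(counts)
)

# Features metadata: one row per row of the assay
feature_info <- DataFrame(
  symbol = c("A", "B", "C"),
  row.names = rownames(counts)
)

se <- SummarizedExperiment(
  assays  = list(counts = counts),
  rowData = feature_info,
  colData = sample_info
)

assay(se, "counts")  # assay data matrix
colData(se)          # sample metadata
rowData(se)          # features metadata
```

Because the three parts are kept in one object, subsetting such as `se[, se$condition == "tumor"]` filters the assay and the sample metadata together.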

To visualize the SummarizedExperiment object select GDC Data/Manage SummarizedExperiment:

And click on Select Summarized Experiment file. Select the file downloaded from GDC. You can access the sample metadata, the assay data,

Figure: Accessing assay information from SummarizedExperiment

or the features metadata.

Figure: Accessing features information from SummarizedExperiment
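The saved file can also be inspected from the R console. The sketch below assumes the Download & Prepare step wrote an .rda file (check the path and extension reported in the GUI message) and uses a hypothetical helper, `load_rda`, that loads the stored objects into a private environment so nothing in the workspace is overwritten. A toy data frame stands in for the real file here so the example runs without any downloaded data.

```r
# Load every object stored in an .rda file into a private environment
# and return them as a named list, leaving the workspace untouched.
# `load_rda` is a hypothetical helper written for this sketch.
load_rda <- function(path) {
  env <- new.env()
  load(path, envir = env)
  mget(ls(env), envir = env)
}

# Stand-in demo: a toy data frame saved to a temporary .rda file
tmp <- tempfile(fileext = ".rda")
toy <- data.frame(sample = c("s1", "s2"), value = c(0.1, 0.9))
save(toy, file = tmp)

objects <- load_rda(tmp)
names(objects)       # names of the stored objects
class(objects$toy)   # class of the stored object
```

With a real file, the loaded SummarizedExperiment object can then be passed to `assay()`, `colData()`, and `rowRanges()` from the SummarizedExperiment package to reach the same three tables shown in the GUI.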

DNA methylation

Again, fill in the search fields with the information shown below and click on Visualize Data. If you select Filter using clinical data under the clinical filter, the clinical information will also be plotted.

A plot with the summary of the data will be shown.

After the query is completed, you will be able to download the data and convert it to an R object in the Download & Prepare section.

If successful, it will show a message indicating where the data was saved.
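The Download & Prepare step can be approximated in the R console with `GDCdownload()` and `GDCprepare()` from TCGAbiolinks. This is a minimal sketch under stated assumptions, not the code the GUI runs: `download_and_prepare`, `out_file`, and `data_dir` are hypothetical names, and the project, platform, and data type in the usage comment are illustrative values that may differ from the search fields used in the workshop or from what your GDC release accepts.

```r
# Download the files matched by a query and prepare them as an R object.
# `query` is the object returned by TCGAbiolinks::GDCquery();
# `out_file` and `data_dir` are hypothetical names chosen for this sketch.
download_and_prepare <- function(query,
                                 out_file = "methylation_se.rda",
                                 data_dir = "GDCdata") {
  # Fetch the raw files into `data_dir`
  TCGAbiolinks::GDCdownload(query, directory = data_dir)
  # Read them into a single object and save it to `out_file`
  TCGAbiolinks::GDCprepare(
    query,
    save          = TRUE,
    save.filename = out_file,
    directory     = data_dir
  )
}

# Usage (needs TCGAbiolinks and internet access; downloads can be large):
# query <- TCGAbiolinks::GDCquery(
#   project       = "TCGA-LUSC",                      # assumption
#   data.category = "DNA Methylation",
#   data.type     = "Methylation Beta Value",         # assumption
#   platform      = "Illumina Human Methylation 450"  # assumption
# )
# se <- download_and_prepare(query)
```

Keeping the download directory and the output file as arguments makes it easier to store expression and methylation data side by side without overwriting either.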

Session Info

sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.5
## 
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] Bioc2017.TCGAbiolinks.ELMER_0.0.0.9000
##  [2] pander_0.6.0                          
##  [3] SummarizedExperiment_1.6.3            
##  [4] DelayedArray_0.2.7                    
##  [5] matrixStats_0.52.2                    
##  [6] Biobase_2.36.2                        
##  [7] GenomicRanges_1.28.3                  
##  [8] GenomeInfoDb_1.12.2                   
##  [9] IRanges_2.10.2                        
## [10] S4Vectors_0.14.3                      
## [11] BiocGenerics_0.22.0                   
## [12] TCGAbiolinks_2.5.6                    
## [13] bindrcpp_0.2                          
## [14] MultiAssayExperiment_1.2.1            
## [15] dplyr_0.7.1                           
## [16] DT_0.2                                
## [17] ELMER_2.0.1                           
## [18] ELMER.data_2.0.1                      
## 
## loaded via a namespace (and not attached):
##   [1] rtracklayer_1.36.3            ggthemes_3.4.0               
##   [3] prabclus_2.2-6                R.methodsS3_1.7.1            
##   [5] tidyr_0.6.3                   ggplot2_2.2.1                
##   [7] acepack_1.4.1                 bit64_0.9-7                  
##   [9] knitr_1.16                    aroma.light_3.6.0            
##  [11] R.utils_2.5.0                 data.table_1.10.4            
##  [13] rpart_4.1-11                  hwriter_1.3.2                
##  [15] RCurl_1.95-4.8                AnnotationFilter_1.0.0       
##  [17] doParallel_1.0.10             GenomicFeatures_1.28.4       
##  [19] RSQLite_2.0                   commonmark_1.2               
##  [21] bit_1.1-12                    BiocStyle_2.4.0              
##  [23] xml2_1.1.1                    httpuv_1.3.5                 
##  [25] assertthat_0.2.0              viridis_0.4.0                
##  [27] hms_0.3                       evaluate_0.10.1              
##  [29] BiocInstaller_1.26.0          DEoptimR_1.0-8               
##  [31] dendextend_1.5.2              km.ci_0.5-2                  
##  [33] DBI_0.7                       geneplotter_1.54.0           
##  [35] htmlwidgets_0.9               reshape_0.8.6                
##  [37] EDASeq_2.10.0                 matlab_1.0.2                 
##  [39] purrr_0.2.2.2                 selectr_0.3-1                
##  [41] ggpubr_0.1.4                  backports_1.1.0              
##  [43] trimcluster_0.1-2             annotate_1.54.0              
##  [45] biomaRt_2.32.1                ensembldb_2.0.3              
##  [47] withr_1.0.2                   Gviz_1.20.0                  
##  [49] BSgenome_1.44.0               robustbase_0.92-7            
##  [51] checkmate_1.8.3               GenomicAlignments_1.12.1     
##  [53] mclust_5.3                    mnormt_1.5-5                 
##  [55] cluster_2.0.6                 lazyeval_0.2.0               
##  [57] genefilter_1.58.1             edgeR_3.18.1                 
##  [59] pkgconfig_2.0.1               labeling_0.3                 
##  [61] nlme_3.1-131                  ProtGenerics_1.8.0           
##  [63] nnet_7.3-12                   devtools_1.13.2              
##  [65] bindr_0.1                     rlang_0.1.1                  
##  [67] diptest_0.75-7                downloader_0.4               
##  [69] AnnotationHub_2.8.2           dichromat_2.0-0              
##  [71] rprojroot_1.2                 Matrix_1.2-10                
##  [73] KMsurv_0.1-5                  zoo_1.8-0                    
##  [75] base64enc_0.1-3               whisker_0.3-2                
##  [77] GlobalOptions_0.0.12          viridisLite_0.2.0            
##  [79] rjson_0.2.15                  bitops_1.0-6                 
##  [81] shinydashboard_0.6.1          R.oo_1.21.0                  
##  [83] ConsensusClusterPlus_1.40.0   Biostrings_2.44.1            
##  [85] blob_1.1.0                    shape_1.4.2                  
##  [87] stringr_1.2.0                 ShortRead_1.34.0             
##  [89] readr_1.1.1                   scales_0.4.1                 
##  [91] memoise_1.1.0                 magrittr_1.5                 
##  [93] plyr_1.8.4                    zlibbioc_1.22.0              
##  [95] compiler_3.4.0                RColorBrewer_1.1-2           
##  [97] Rsamtools_1.28.0              XVector_0.16.0               
##  [99] htmlTable_1.9                 Formula_1.2-2                
## [101] MASS_7.3-47                   stringi_1.1.5                
## [103] yaml_2.1.14                   locfit_1.5-9.1               
## [105] latticeExtra_0.6-28           ggrepel_0.6.5                
## [107] survMisc_0.5.4                grid_3.4.0                   
## [109] VariantAnnotation_1.22.3      tools_3.4.0                  
## [111] rstudioapi_0.6                circlize_0.4.0               
## [113] foreach_1.4.3                 foreign_0.8-69               
## [115] gridExtra_2.2.1               digest_0.6.12                
## [117] shiny_1.0.3                   cmprsk_2.2-7                 
## [119] fpc_2.1-10                    Rcpp_0.12.11                 
## [121] broom_0.4.2                   httr_1.2.1                   
## [123] survminer_0.4.0               AnnotationDbi_1.38.1         
## [125] biovizBase_1.24.0             ComplexHeatmap_1.14.0        
## [127] psych_1.7.5                   kernlab_0.9-25               
## [129] colorspace_1.3-2              rvest_0.3.2                  
## [131] XML_3.98-1.9                  splines_3.4.0                
## [133] flexmix_2.3-14                plotly_4.7.0                 
## [135] xtable_1.8-2                  jsonlite_1.5                 
## [137] UpSetR_1.3.3                  modeltools_0.2-21            
## [139] R6_2.2.2                      Hmisc_4.0-3                  
## [141] htmltools_0.3.6               mime_0.5                     
## [143] glue_1.1.1                    BiocParallel_1.10.1          
## [145] DESeq_1.28.0                  class_7.3-14                 
## [147] interactiveDisplayBase_1.14.0 codetools_0.2-15             
## [149] mvtnorm_1.0-6                 lattice_0.20-35              
## [151] tibble_1.3.3                  curl_2.7                     
## [153] survival_2.41-3               limma_3.32.2                 
## [155] roxygen2_6.0.1                rmarkdown_1.6                
## [157] munsell_0.4.3                 GetoptLong_0.1.6             
## [159] GenomeInfoDbData_0.99.0       iterators_1.0.8              
## [161] reshape2_1.4.2                gtable_0.2.0

Bibliography

Huber, Wolfgang, Vincent J Carey, Robert Gentleman, Simon Anders, Marc Carlson, Benilton S Carvalho, Hector Corrada Bravo, et al. 2015. “Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nature Methods 12 (2). Nature Publishing Group: 115–21.

Morgan, M, J Hester, V Obenchain, and H Pagès. n.d. “SummarizedExperiment: SummarizedExperiment Container. R Package Version 1.1.0.” http://bioconductor.org/packages/SummarizedExperiment/.